Mask estimation incorporating time-frequency trajectories for a CASA-based ASR front-end

Authors

Ji Hun Park

Jae Sam Yoon

Hong Kook Kim

Abstract

In this paper, we propose a mask estimation method for a computational auditory scene analysis (CASA)-based speech recognition front-end using speech obtained from two microphones. The proposed method builds on the observation that mask information should be correlated over contiguous analysis time frames and adjacent frequency channels. To this end, two hidden Markov models (HMMs), a time HMM and a frequency HMM, representing the time and frequency trajectories respectively, are trained using features such as the interaural time difference and the interaural level difference of the two-channel signals. A mask for a given time-frequency bin is estimated by combining the likelihoods obtained from the two HMMs and is used to separate the desired speech from noisy speech. To show the effectiveness of the proposed mask estimation, we first measure the root mean square error between the ideal mask and the mask estimated by the proposed method. Then, we compare the performance of a speech recognition system using the proposed mask estimation method with that of systems using conventional methods. In these experiments, the proposed method provides an average word error rate reduction of 63.2% and 3.1% when compared with the Gaussian kernel-based and time HMM-based mask estimation methods, respectively.
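The abstract does not say how the two HMM likelihoods are combined, so the following is only a minimal sketch under stated assumptions. It assumes each model yields a per-bin log-likelihood ratio (target versus noise) and merges them with a weighted sum followed by a logistic function. It then computes the root mean square error against an ideal mask. The names `combine_to_mask`, `mask_rmse`, and `weight`, and all numeric values, are hypothetical and are not taken from the paper.

```python
import math


def combine_to_mask(llr_time, llr_freq, weight=0.5):
    """Combine per-bin log-likelihood ratios into a soft mask in [0, 1].

    llr_time[t][f] and llr_freq[t][f] are hypothetical log-likelihood
    ratios (target vs. noise) from a time model and a frequency model.
    ASSUMPTION: the weighted sum plus logistic function is an illustrative
    combination rule, not the rule used in the paper.
    """
    mask = []
    for row_time, row_freq in zip(llr_time, llr_freq):
        mask.append([
            1.0 / (1.0 + math.exp(-(weight * a + (1.0 - weight) * b)))
            for a, b in zip(row_time, row_freq)
        ])
    return mask


def mask_rmse(ideal, estimated):
    """Root mean square error over all time-frequency bins."""
    squared_errors = [
        (i - e) ** 2
        for row_ideal, row_est in zip(ideal, estimated)
        for i, e in zip(row_ideal, row_est)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy input: 2 time frames x 2 frequency channels (made-up values).
llr_time = [[2.0, -1.0], [0.5, -3.0]]
llr_freq = [[1.0, -2.0], [-0.5, -1.0]]
ideal = [[1.0, 0.0], [1.0, 0.0]]

estimated = combine_to_mask(llr_time, llr_freq)
print(mask_rmse(ideal, estimated))
```

A lower RMSE means the estimated mask is closer to the ideal one. In this sketch, `weight` sets how much the time trajectory counts relative to the frequency trajectory.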


Similar resources

SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment

In this paper, we propose a computational auditory scene analysis (CASA)-based front-end for two-microphone speech recognition in a car environment. One of the important issues associated with CASA is the accurate estimation of mask information for target speech separation within multiple microphone noisy speech. For such a task, the time-frequency mask information is compensated through the si...


ASR-driven Binary Mask Estimation for Robust Automatic Speech Recognition

Additive noise has long been an issue for robust automatic speech recognition (ASR) systems. One approach to noise robustness is the removal of noise information through segregation by binary time-frequency masks; each time-frequency unit in a spectro-temporal representation of the speech signal is labeled either noise-dominant or signal-dominant. The noise-dominant units are masked and their e...


CASA based speech separation for robust speech recognition

This paper introduces a speech separation system as a front-end processing step for automatic speech recognition (ASR). It employs computational auditory scene analysis (CASA) to separate the target speech from the interfering speech. Specifically, the mixed speech is preprocessed based on an auditory peripheral model. Then pitch tracking is conducted and the dominant pitch is used as a main cu...


Mask estimation in non-stationary noise environments for missing feature based robust speech recognition

In missing feature based automatic speech recognition (ASR), the role of the spectro-temporal mask in providing an accurate description of the relationship between target speech and environmental noise is critical for minimizing the degradation in ASR word accuracy (WAC) as the signal-to-noise ratio (SNR) decreases. This paper demonstrates the importance of accurate characterization of instanta...


Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion

This paper concerns the use of speech enhancement to improve automatic speech recognition (ASR) performance in noisy environments. Speech enhancement systems are usually designed separately from a back-end recognizer by optimizing the front-end parameters with signal-level criteria. Such a disjoint processing approach is not always useful for ASR. Indeed, time-frequency masking, which is widely u...



Publication date: 2008
